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Identifying rare, highly penetrant risk mutations may be an important step in dissecting the molecular etiology of 
schizophrenia. We conducted a gene-based analysis of large (>1 00 kb), rare copy-number variants (CNVs) in the 
Wellcome Trust Case Control Consortium 2 (WTCCC2) schizophrenia sample of 1 564 cases and 1 748 controls all 
from Ireland, and further extended the analysis to include an additional 51 96 UK controls. We found association 
with duplications at chr20p12.2 (P = 0.007) and evidence of replication in large independent European schizo- 
phrenia (P = 0.052) and UK bipolar disorder case-control cohorts (P = 0.047). A combined analysis of Irish/UK 
subjects including additional psychosis cases (schizophrenia and bipolar disorder) identified 22 carriers in 
11 707 cases and 10 carriers in 21 204 controls [meta-analysis Cochran-Mantel-Haenszel P-value = 2 x 
10 4 ; odds ratio (OR) = 11.3, 95% CI = 3.7, oo]. Nineteen of the 22 cases and 8 of the 10 controls carried duplica- 
tions starting at 9.68 Mb with similar breakpoints across samples. By haplotype analysis and sequencing, we 
identified a tandem ~149 kb duplication overlapping the gene p21 Protein-Activated Kinase 7(PAK7, also 
called PAK5) which was in linkage disequilibrium with local haplotypes (P = 2.5 x 10 21 ), indicative of a 
singleancestral duplication event. We confirmed the breakpoints in 8/8 carriers tested and found co-segregation 
of the duplication with illness in two additional family members of one of the affected probands. We demonstrate 
that PAK7\s developmental^ co-expressed with another known psychosis risk gene {DISC1) suggesting a po- 
tential molecular mechanism involving aberrant synapse development and plasticity. 



INTRODUCTION 

Schizophrenia [MIM 181500] is a poorly understood, but severe, 
heritable mental disorder with a lifetime risk of ~ 1 %. The emer- 
ging genetic architecture includes a spectrum of risk variation 
from rare mutations of large effect, to common risk variants of 
small effect [odds ratio (OR) < 1.15] which collectively account 
for at least 25% of susceptibility (1-3). Through genome-wide as- 
sociation study (GWAS) and subsequent meta-analysis, more 
than 20 independent common loci have been confirmed, but 
there are likely to be many more (4-6). A much smaller number 
of rare mutations of moderate or large effect have been identified, 
but these will be particularly important in facilitating dissection of 
the risk phenotype in model systems (7). 

The list of rare, highly penetrant schizophrenia mutations 
(OR = 2-30) includes recurrent de novo copy-number variants 
(CNVs) involving deletions or duplications of large, genomic 
regions (> 100 kb) (8,9), but also the accumulation of different 
CNV events at specific loci implicating single genes (NRXN1 
[MIM 600565] (10-12) and VIPR2 [MIM 601970]) (13,14). 
Almost all of the confirmed CNVs are also risk factors for other 
psychiatric or developmental phenotypes [e.g. intellectual disabil- 
ity, attention deficit hyperactivity disorder (ADHD), autism] 
(15,16). This is in keeping with epidemiological and GWAS 
data supporting shared genetic liability between schizophrenia 
and other psychiatric disorders, in particular bipolar disorder 
(17-19). As current findings explain a modest proportion of 
total schizophrenia susceptibility, expanding the number of risk 
mutations will be important in understanding molecular etiology 
but also the relationships between these clinical disorders (20). 

We report a gene-based analysis of large (> 100 kb), rare 
[ < 1 % minor allele frequency (MAF)] CNVs in the Wellcome 
Trust Case Control Consortium 2 (WTCCC2) schizophrenia 



sample of 1564 cases and 1748 controls, all from Ireland. We 
find evidence of association with duplications at 20pl2.2 and 
further support in a large independent European sample. All of 
the carriers were from the British Isles and an extended analysis 
including 32 91 1 Irish/UK subjects provided further association 
support. By haplotype analysis and sequencing, we show that the 
CNV is not the result of repeated de novo events and is an inher- 
ited risk mutation potentially inherited from a single European 
ancestor. This accords with emerging evidence for rare CNVs 
that are population-specific and associated with substantial 
illness risk (21,22), consistent with the 'Clan Genomics' 
concept (23). The mutation involves a tandem duplication of 
148 951 bp at chr20:9,684,767-9,833,717(hgl8) overlapping 
the gene p21 Protein-Activated Kinase 7 (PAK7, also known 
as PAK5). Other PAK family members modulate synaptic 
network development through a signaling pathway regulated 
by the schizophrenia risk gene DISCI (24). We demonstrate 
that PAK7 is co-expressed with DISCI in developing brain and 
further investigation of the role of the mutation in synaptic 
mechanisms salient to schizophrenia is warranted. 

RESULTS 

Discovery phase evidence of association at four loci 

We observed evidence of association (P < 0.05) at four loci in 
the Irish discovery dataset of 1564 cases and 1748 controls; 
20pl2.2(PAK7 [MIM 608038]); 2cen-ql3 (ANKRD36B, 
COX5B [MIM 123866]; ACTR1B [MIM 605144]); 3p25.1 
(MRPS25 [MIM 61 1987], ZFYVE20 [MIM 6095 1 1]) and a pre- 
viously confirmed schizophrenia risk locus [chrlq.21 (CHDIL 
[MIM 613039]) locus (8,9)]. Association results for the three 
novel loci are presented in Table 1 . To improve our estimate of 
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the population frequency of what are rare events, we examined 
the three novel loci in an extended control sample (n = 6944) in- 
cluding UK controls from WTCCC2 [2533 controls from the UK 
National Blood Service (NBS) and 2663 from the UK 1 958 Birth 
Cohort (58BC)]. We identified chrlq21.1 risk duplications in 
four cases, chromosome 15ql3.3 deletions in three cases and a 
single case with the large 22qll.2 deletion with no similar 
events seen in controls from the discovery analysis. Details of 
our findings at all previously reported schizophrenia CNV risk 
loci, with locus coordinates, are provided in Supplementary 
Material, Table S3a. Further, we provide details of all large 
(>100kb), rare CNVs identified in our discovery sample in 
cases and controls (Supplementary Material, Table S3b). 



Replication phase association evidence at chr20pl2.2 

Three loci were carried forward for replication (Table 1) in the 
independent International Schizophrenia Consortium (ISC) 
dataset of 3 1 1 1 cases and 2267 controls from the UK, Portugal, 
Sweden and Bulgaria (8). Only the chr20pl2.2 locus containing 
p21 Protein-Activated Kinase 7 (PAK7, also known as PAK5) 
showed evidence for association in the replication sample, 
with six duplication events, all in UK cases, and none in controls 
(P = 0.052). Because this is a rare event, and absent in the 
control population, the reported association result was the 
most significant P- value obtainable without a larger sample size. 

As all of the carriers were identified in samples from the 
British Isles, we investigated the locus for additional evidence 
of association in an independent sample of bipolar disorder in 
the WTCCC1 (n = 1697) (25) and University College London 
(UCL) (n = 546) samples (26). The controls included independ- 
ent UK controls from the UCL study (n = 510) and 10259cases 
ascertained for nonpsychiatric disorders in WTCCC 1 . We iden- 
tified four duplication carriers in cases and five carriers in the 
control sample, provided further nominal evidence of associ- 
ation (P = 0.047). 

We performed a combined analysis of all Irish/UK subjects in- 
cluding additional psychosis cases from the CLOZUK study (27) 
(n = 6223) and independent controls from the WTCCC2 UK 
control sample (n = 7703) (28,29). Combining the evidence 
across studies, under the assumption of the same effect on risk 
in each strata, gave a meta-analysis Cochran-Mantel-Haenszel 
(CMH) P-value = 2 x 10" 4 (OR = 1 1.3, 95% CI = 3.7, oo) 
with 22 carriers in 1 1 707 psychosis cases and 10 earners in 21 
204 controls. 



Evidence for a single founder risk mutation at PAK7 

Based on the array data, we examined the inferred breakpoints 
for the chr20pl2.2 duplications, and the estimated start and 
stop coordinates for the 32 duplication carriers (Fig. 1 , Supple- 
mentary Material, Table S5). Nineteen of the 22 duplication car- 
riers in the cases and 8 of 10 carriers from the controls exhibited 
duplications of 132-146.5 kb with very similar start positions 
(~9.68 Mb), apparently similar breakpoints across samples 
and no evidence of flanked segmental duplications. The evi- 
dence for association at the locus in the Irish/UK sample 
appeared to be driven by these duplication cases. The remaining 
five individuals carried duplications occurring on different 
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Figure 1. Inferred position of duplication events at the chr20p 12.2 locus in build hg 1 8 . SZ, schizophrenia. BD, bipolar disorder. See Supplementary Material, Table S5 
for exact start/stop coordinates. Duplications in controls are colored light blue and SZ or BP cases are colored dark blue. 



chr20: 9,833,717 



15bp 
insertion 



1 1 M l 1 1 1 II 1 1 1 1 1 H I I 1 1 1 1 1 1 1 1 1 1 H I 

3AA A ATGAGA GC TC A Q^SlA TGA C AGT T C TG C 4 G 



chr20: 9,684,767 



UlhMllHllMllll 

J>ATTAAATCCCAAACTCAT 




Figure 2. Capillary sequence of the unique region in the middle of the two copies of the tandem repeat sequence at the PAK7 duplication. The first copy of the sequence 
ends at chr20:9,833,717(hgl8) and the second copy starts with sequence from chr20:9,684,767(hgl8), thereby defining the breakpoints. A 15 bp sequence is present 
between the two copies. 



haplotype backgrounds and are likely to represent different 
molecular events. 

To explore whether this could represent a single ancestral mu- 
tation event, we examined haplotype sharing in the five Irish case 
samples in linkage disequilibrium (LD) blocks (termed Hap 
Block 1 and Hap Block 2) immediately 5' and 3' of the inferred 
duplication. Haplotypes were phased in PLINK with a probabil- 
ity >0.98. All five Irish cases carried a copy of the same haplo- 
type in both Hap Block 1 (CCTT, /= 0.188) and Hap Block 2 
(TCAA, /= 0.208) (Supplementary Material, Tables S6 and 
S7). We extended the analysis to include 1 8 of the 22 UK carriers 
(as these had been genotyped on the same Affymetrix platform) 
and found that all carried copies of the CCTT and TCAA haplo- 
types flanking the duplication (Supplementary Material, 
Table S8). Given the low level of LD between these blocks 
(D' = 0.22, r 2 = 0.02) the probability of making these observa- 
tions if the duplication was the result of repeated de novo muta- 
tions is extremely low (P = 2.5 x 10 2 ), providing support for 
a single ancestral duplication event. We did not find an increased 
level of relatedness among the Irish PAK7 duplication carriers 



compared with the level of relatedness of random Irish indivi- 
duals, arguing against a very recent event (see Supplementary 
Material, Fig. SI). 

Next, we confirmed the duplication event in one individual 
(IRL_101) using an Agilent custom designed comparative 
genomic hybridization (CGH) microarray with high probe 
density in the PAK7 region. The breakpoints identified by CGH 
(chr20:9,684,902-9,833,151) map closely to those predicted by 
the SNP array analyses (chr20:9,685,41 3 -9,83 1 ,947; Supplemen- 
tary Material, Fig. S2).The event overlaps exon 1 and enhancer and 
promoter regions adjacent to exon 1 in all four known PAK7 
transcripts, and exon 2 in two alternative transcripts and is 
absent from the Database of Genomic Variants (http: //projects, 
tcag.ca/variation/). By capillary sequencing this individual, we 
identified a tandem duplication of 148 951 bp at chr20:9,684, 
767-9, 833,717(hgl8) with a 15 bp sequence inserted between 
the first and second copies of the repeated sequence (Fig. 2). By se- 
quencing the five remaining Irish carriers and two of the UK car- 
riers (SCOT_101, SCOT_201), we confirmed that all shared 
exactly the same breakpoints with the same 1 5 bp sequence 
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Figure 3. Familial pedigrees for patients from three families (IRL_6, IRL_1, 
SC0T_1 ) who were found to be positive for PAK7 duplications. The index indi- 
vidual in each family screened is indicated with an arrow. Individuals positive 
(PAK7 + VE) or negative (PAK7 - VE) for PAK7 are indicated. Filled 
symbols indicate individuals with psychotic disorders (SCZ, schizophrenia; 
SCA, schizoaffective disorder; BP A, bipolar affective disorder; PDNOS, psych- 
otic disorder not otherwise specified). Partial vertically shaded symbol indicates 
depressive disorder (DEP, depression). Partial horizontal shaded symbol indi- 
cates learning disability (LD). Cause of death where known: SUIC, suicide; 
BA, birth asphyxia; CT, cerebral tumor. 

inserted between the two copies of the repeat sequence, suggesting 
that this represents the same duplication event. 

Assessment of additional family members of carriers 

We were able to access additional family members of two cases to 
test for evidence of co-segregation of the duplication with illness 
(see Fig. 3). In the family of IRL-101, there were three other sur- 
viving siblings, none of whom had a diagnosis of schizophrenia or 
a related major mental disorder. One sister (IRL_102) agreed to be 
tested. She had experienced a depressive episode after bereave- 
ment and did not carry the duplication. By going back to the origin- 
al G WAS data, we identified another Irish case (IRL_60 1 ) that had 
been excluded from the primary analysis because they are related 
to IRL_1 0 1 based on a proportion of identity-by descent estimated 
at 0.033. IRL_601 is a female patient with a history of Schizo- 
affective Disorder. She is the youngest in a sibship of five children 
of which four were affected with a major psychotic disorder. We 
confirmed that two affected brothers (IRL_603 (bipolar I dis- 
order), IRL_604 (psychotic disorder, not otherwise specified)) 
also carry the PAK7 duplication. The remaining affected brother 
[IRL_602 (bipolar I disorder)] died by suicide. We tested one of 
the eight offspring of IRL_601, and this unaffected individual 
did not carry the duplication. In the SCOT_l family, we confirmed 
that the duplication was inherited from the mother (SCOT_103). 
Neither of the parents was affected clinically, but there was a 
family history of depression in both families, and a brother with 
a history of psychosis did not carry the duplication (SCOT_104). 

Is the mutation present in other European ancestry 
populations? 

To assess whether the mutation was exclusive to the British Isles 
or present elsewhere in Europe, we examined additional 



European populations. Details of the cohorts analyzed are 
included in Supplementary Material, Table SI. The estimated 
control frequency for the mutation in the UK/Irish population 
is low (f = 0.037%) and most of these cohorts were underpow- 
ered to provide an accurate frequency estimate for what is a 
rare event. We failed to identify any carriers in large Icelandic 
(>72 000 individuals) and Finnish samples (~3800 indivi- 
duals), but did find carriers in populations from the Eastern 
USA, the Netherlands, Sweden and Denmark. We were able to 
unambiguously phase haplotypes in five US control carriers 
with available Affymetrix data, confirming that all of these indi- 
viduals shared the same haplotype background as the Irish/UK 
cases, suggesting that they carry the same mutational event al- 
though the samples were not available for confirmatory sequen- 
cing. The other European samples were genotyped on a different 
array platform and where we could definitively phase haplo- 
types; this confirmed the same haplotype background. These 
data suggest that the mutation is present in other European popu- 
lations. The limited number of cases available precluded formal 
association analyses, but we identified four additional cases with 
the duplication in Dutch, Danish and Swedish cohorts. Informa- 
tion on the clinical characteristics, illness course, co-morbidity 
and family history for all mutation case carriers is provided in 
Supplementary Material, Table S9. 

Functional investigation of PAK7 

The p2 1 -activated kinases (PAKs) are a family of serine/threo- 
nine protein kinases, which are regulated by the Rho family of 
small G proteins and are involved in multiple intracellular sig- 
naling pathways. Six PAK genes are expressed in human and 
based on their regulatory functions are classified into Group I 
(PAK 1-3) and Group II (PAK4-6) members (30). Group I 
PAKs are activated by RAC-PAK signaling to promote axon 
connectivity, and synapse formation, in the developing brain 
in a pathway regulated by another schizophrenia risk gene 
DISCI [MIM:60521] (31). Aberrant synaptic network develop- 
ment and maintenance represents a plausible molecular mechan- 
ism for psychosis susceptibility and mutations involving other 
Group I PAKs (PAK2 (the 3q29 microdeletion syndrome locus 
[MIM 609425]) and PAK3 [MIM:300558]) are known to be 
associated with neurodevelopmental syndromes characterized 
by psychosis (32). PAK7 is a brain-specific isoform within the 
cell it is localized to filipodia, where it has been shown to 
promote the induction of neurite outgrowth, filopodium forma- 
tion and synaptic vesicle trafficking (33). The Group II PAKs, in- 
cluding PAK7, differ structurally from Group I members and 
their mechanism of activation requires clarification. 

To test whether PAK7 may be under similar regulatory control 
to Group I PAKs during early brain development, we investi- 
gated the relationship between PAK7 and DISCI. Examining 
Brainspan data (brainspan.org), we identify significant negative 
correlation between DISCI and PAK7 expression in whole 
human brain throughout life (r 2 = — 0.297 to 0.867). The stron- 
gest correlations are for the first and last trimester of pregnancy 
and the first 5 years of life, with the level of correlation falling 
through adulthood. In a co-imrnunoprecipitation experiment, 
we confirmed interaction between PAK7 and DISCI in synapto- 
neurosomal preparations from full mouse brain at postnatal day 
8-10 [(P8-10); Fig. 4A]. Next, we ran the reciprocal 
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Figure 4. PAK7 and DISC 1 interact in mouse brain synapses. (A) Immunoprecipitation of PAK7 and DISC 1 from synaptoneurosomal extracts of adult mouse brains. 
The extract was immunoprecipitated for PAK7 and the eluate was immunostained for DISC 1 . (B) Reciprocal experiment described in A: immunoprecipitation of 
DISCI followed by immunostaining of the eluate with antiPAK.7. In both A and B, the samples immunoprecipitated with control IgG showed no signal, showing 
the specificity of the PAK7-DISC1 interaction. An additional control in (C) shows that the brain extracts before immunoprecipitation have the same amount of 
loading control ((i-actin). (D-N) Immunostaining for PAK7 (green) and DISCI (red) in brain sections containing Hippocampus and Cortex of P7 (D— I) and 
Adult (J-O) mice. The experiment shows an overlapping (yellow) of PAK7 and DISCI in puncta, confirming a synaptic interaction between the two proteins. 
The co-localization is particularly evident in the adult (J-O) preparation. 



experiment and demonstrated immunoprecipitation of DISCI 
followed by immunostaining of the eluate with antiPAK7 
(Fig. 4B). Finally, we explored the regional and temporal evi- 
dence for co-expression in brain regions implicated in schizo- 
phrenia etiology (hippocampus and cortex). In this analysis, 
we co-immunostained PAK7 and DISCI in P7 and adult 
(>P60) mice. We find clear overlap in perisomal puncta, 
which confirms that the two proteins interact at the synapse, par- 
ticularly in adult mouse brain (Fig. 4D-0). This suggests that 
PAK7 is developmentally co-expressed with DISCI and may 
have a specific functional role at the synapse. 

DISCUSSION 

We report evidence that rare, chr20pl2.2 duplications increase 
risk for the major psychotic disorders, schizophrenia and 
bipolar disorder. Previous schizophrenia studies have identified 
at least eight rare CNVs of strong effect (genotypic relative risks 
of 4-20) (5). Most of these CNVs have been identified because 
they recur in the population, either as relatively large events im- 
plicating many genes and arising from nonallelic homologous 
recombination (NAHR), or through the accumulation of 



different, generally de novo mutations at a single locus (e.g. 
NRXN1, VIPR2). At the PAK7 locus, we identified a single 
founder event with a 15 bp sequence inserted between the two 
copies of the tandem duplication. From the 1000 Genomes 
Project data, most tandem duplications of this type are formed 
by fork stalling and template switching as detailed in the 
micro-homology-mediated break-induced replication model 
(34,35). 

We identified 2653 genes affected by a CNV (> 100 kb but 
<10Mb) in at least one individual in our study. However, 
only 382 genes had the four or more overlapping CNVs required 
to achieve P < 0.05 in our discovery sample. Based on our data, 
the overall contribution of CNVs to psychosis susceptibility risk 
is difficult to estimate as we could not exclude rarer or smaller 
events. Ongoing CNV analysis of the Psychiatric Genomics 
Consortium dataset will be important in more comprehensively 
investigating rare, large CNV events (17). 

By haplotype analysis and sequencing, we established that the 
association is driven by a ~ 149 kb duplication at the gene PAK7 
with a likely common European ancestor. Inherited mutations 
have been described contributing to other neurological 
disorders, but their relative contribution to schizophrenia 
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susceptibility is unknown (2 1 ,22,36). Such events are likely to be 
recent in origin and differentiated between populations making 
them difficult to replicate across diverse collections using stand- 
ard gene association methods in large multicohort CNV studies 
(37-39). Although a single variant is unlikely to influence the 
overall population risk for a disorder, founder risk mutations 
can be identified through association analysis in specific popula- 
tions. By examining cases excluded, through relatedness QC, 
from our original Irish GWAS dataset, we identified an addition- 
al duplication carrier related at a distance of five meioses to one 
of the index cases. Within the family of this additional carrier, we 
found evidence of co-segregation of the duplication with psych- 
osis in two affected siblings. Identifying cryptic relationships 
between individuals in available GWAS datasets may be 
helpful in identifying extended pedigrees for analysis as a com- 
plementary approach to existing association methods. 

The risk duplication identified in this study overlaps the first 
two exons and enhancer and promoters regions in the known 
transcripts of PAK7. Because PAK7 is exclusively brain 
expressed, the impact of the duplication on gene expression 
and the genetic mechanism involved are yet to be determined. 
This is important as we identified other larger duplication 
events at the locus in two control samples, but do not know if 
these would have the same or different effects on gene function. 
From existing animal models of PAK7, we do know that knock- 
out mice are viable with no obvious developmental abnormal- 
ities but PAK6/PAK7 double knockout mice show behavioral 
and learning deficits suggesting functional redundancy 
between these isoforms (40). Genomic studies in carriers of 
the founder mutation will be important as additional risk variants 
or mutations (e.g. mDISCl orPAK6)maybe contributing to risk 
in this extended pedigree. 

The PAK7 risk mutation is rare, but based on data from the 
British Isles, we estimate the exposed attributable risk for car- 
riers is likely to be substantial (41). We attempted to date the 
origin of the duplication by examining haplotype sharing in 
this genomic region. After characterizing the haplotype patterns 
in the region, we were unable to make robust inference: further 
family data will be required to accurately phase haplotypes for 
this rare event. 

Almost all of the risk CNVs identified to date in the schizo- 
phrenia literature also increase susceptibility to other develop- 
mental phenotypes including intellectual disability, autism and 
seizure disorder. Only four of our cases received a full cognitive 
assessment (including IQ measurement), and all were assessed 
as being within the normal range of cognitive function. One in- 
dividual had a history of co-morbid language delay and 
another had a history of seizure disorder. Although most of the 
cases had schizophrenia, seven had bipolar disorder or schizo- 
affective disorder. This is in keeping with evidence of shared 
genetic etiology between these disorders, although the evidence 
for CNV involvement in bipolar disorder has until now been 
more equivocal (42,43). Having identified carriers of the dupli- 
cation in control cohorts, we were interested in testing whether 
'control' individuals had psychiatric or developmental pheno- 
types. In almost all studies (including the individual WTCCC 1 
studies), controls were not screened for mental disorder and 
little if any developmental information was available. Within 
the UK and Ireland health biobank, data are now becoming avail- 
able on a sufficiently large scale to allow us to directly examine 



the phenotypic implications of carrying rare events in the general 
population and this is one of future goals (e.g. The Generation 
Scotland Family Health Study, UK Biobank and 'Growing Up 
in Ireland' Study). Further clinical and genetic assessment of 
patient carriers and their families will also be important in quan- 
tifying the penetrance and range of phenotypic expression of the 
PAK7 duplication. 

Schizophrenia is a neurodevelopmental disorder and a disease 
model involving aberrant synaptic network development has 
been proposed (44). Group I PAKs play a role in the development 
of these networks through a signaling pathway regulated by an 
established susceptibility gene (DISCI) (45). PAK7 is a brain- 
specific isoform, within the cell it is localized to filipodia, 
where it has been shown to promote the induction of neurite out- 
growth, filopodium formation and synaptic vesicle trafficking 
(33). To clarify its mechanism of activation, we performed func- 
tional studies that confirmed interaction between PAK7 and 
DISCI , suggesting that this gene may be under similar regulatory 
control to other PAK family members. Furthermore, this inter- 
action occurs at the synapse and is particularly evidence in 
adult mouse brain suggesting a role in synaptic plasticity. 

In conclusion, we have identified a psychosis risk duplication 
at the gene PAK7, with evidence that it is inherited from a 
common ancestor. The gene functions in the development and 
maintenance of synaptic networks and may be under the regula- 
tory control of DISCI . Many of the cases segregating the DISC 1 
translocation in the original Scottish schizophrenia pedigree 
were affected with schizoaffective disorder or mood disorders 
as was the case in the largest PAK7 pedigree we investigated, 
and we also found evidence for association with bipolar disorder 
(46). This suggests a broader molecular risk mechanism for 
psychosis. A critical next step will be to understand how the du- 
plication impacts on gene expression and function so that it can 
be investigated in model systems. As PAK7 is exclusively 
brain-expressed, this will require further experimental work. 



MATERIALS AND METHODS 

Study subjects 

The discovery sample included 1564 cases and 1748 controls 
from the Irish Schizophrenia Genomics Consortium/WTCCC2 
GWAS study which has previously been described (47). Partici- 
pants, from the Republic of Ireland or Northern Ireland, were 
interviewed using a structured clinical interview and diagnosis 
of schizophrenia (n = 1418) or a related disorder [schizoaffect- 
ive disorder (« = 182); schizophreniform disorder (n = 6)] was 
made by the consensus lifetime best estimate method using 
DSM-IV criteria. Control subjects were ascertained with 
written informed consent from the Irish GeneBank and repre- 
sented blood donors from the Irish Blood Transfusion Service. 
Cases and controls met the same ethnicity criteria (Irish origin 
with all four grandparents bom in Ireland or the UK). To 
improve our estimate of the population frequency of what are 
rare events, we examined nominally significant loci in an 
extended control sample (n = 6944) including UK controls 
from WTCCC2 [2533 controls from the UK National Blood 
Service (NBS) and 2663 from the UK 1958 Birth Cohort 
(58BC)] (28). 
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The first phase of replication examined nonoverlapping sub- 
jects from the ISC dataset (8). To follow-up evidence of associ- 
ation in the British Isles, we performed further association 
analysis in bipolar disorder cases and controls from the UK/ 
Ireland. The WTCCC1 bipolar disorder sample included 1697 
cases recruited throughout the UK. Most of the cases had a diag- 
nosis of bipolar I disorder (71%) or schizoaffective disorder 
(15%). Controls were nonpsychiatric disease case participants 
in the WTCCC1 study (n = 10 259) (25). The UCL bipolar dis- 
order sample included 546 individuals (97% bipolar I disorder) 
and comparison subjects (n = 510) with no personal or first- 
degree history of any mental disorder and from a similar UK or 
Irish ancestry, based on the origin of all four grandparents (26). 

Finally, in an extended analysis, we included additional 
psychosis cases and controls from the UK. The additional psych- 
osis cases were of Caucasian origin and came from the CLOZUK 
sample (« = 6223) (27). The CLOZUK sample consists of 
patients taking the antipsychotic medication Clozapine, a drug 
reserved for the treatment-resistant psychosis patients in the 
UK. The remaining control subjects were ascertained as 
described in previous WTCCC2 studies from the UK National 
Blood Service (NBS) (n = 2533); the UK 1958 Birth Cohort 
(n = 2663); the People of the British Isles (POBI) study and 
from (28,29). We also investigated additional European ancestry 
cohorts for evidence of the PAK7 duplication and details on the 
samples tested, case and/or control numbers and genotyping 
platform are provided in Supplementary Material, Table SI 
(5,9,48,49). 

CNV calling and validation 

We report a schizophrenia study investigating large (> 100 kb), 
rare (<1% MAF in all samples) CNVs using data from the 
WTCCC2. All discovery samples were genotyped using the 
Affymetrix 6.0 platform either at the Affymetrix (Santa Clara, 
CA, USA) or at the Broad Institute (Cambridge, MA, USA) la- 
boratories. Samples were processed using the WTCCC2 pipeline 
and quality control details have been reported previously 
(27,47). 

CNV calls were created using Birdseye from Birdsuite 
(version 1.5.5) (50) for autosomes only, and we excluded calls 
where lengths were < 100 kb or > 10 Mb, or LOD score < 10. 
We excluded CNVs with at least 50% overlap with a region 
copy-number variable in at least 1% of samples; individuals 
with >30 CNV calls or a total event length > 10 Mb and calls 
for samples from plates containing fewer than 40 samples (Sup- 
plementary Material, Table S2). Calls with a copy number of 0 or 
1 were considered to be deletions and calls with a copy number of 
3 or 4 were considered duplications. 

For the replication analysis, the ISC data were called using the 
same calling protocol as in the discovery analysis. Details on 
genotyping platforms used for the other samples are provided 
in Supplementary Material, Table SI. For the identified PAK7 
risk duplication, we performed standard qPCR validation of 
the CNV calls in the discovery sample (Supplementary Material, 
Table S4). Verifying the presence of CNVs by this method 
addresses sensitivity, but not the specificity of CNV calling. 
To test the specificity and sensitivity of calling across genotyp- 
ing platforms, we examined probe intensity data for 6542 
control subjects who had been genotyped on Affymetrix 6.0 



and Illumina 1 .2M-Duo arrays as part of the wider WTCCC2 
study. We called CNVs on the Illumina platform using Quan- 
tiSNP (51) and excluded those with log Bayes factor <10, 
length < 100 kb or > 10 Mb. We excluded samples with more 
than 10 CNVs in total, with more than 10 Mb of total CNV 
length, or failed SNP quality control. This identified the same 
four individuals as the only PAK7 duplication carriers across dif- 
ferent arrays and calling algorithms. For the additional European 
ancestry cohorts, CNV calls were made as described in the 
primary publications [see Supplementary Material, Table SI 
and references (5,9,48,49)]. The CLOZUK calling method is 
described in Guha et al. (2013) (52). 



CNV analysis 

We conducted a gene-based analysis using gene boundaries from 
UCSC Genome Browser refGene (hgl8) and identified 2653 
genes impacted (from the transcription start to end point) by a 
CNV (> 100 kb but < 10 Mb) in at least one individual. Three 
hundred eighty-two genes had at least four overlapping CNVs 
(required to achieve P < 0.05) in the discovery data, represent- 
ing a smaller number of loci, as some CNVs overlap multiple 
genes. Fisher's exact tests were used to calculate P- values in 
the discovery analysis and a fixed-effects CMH test to assess 
the evidence across replication cohorts. 



Haplotype analysis 

We analyzed carriers of the 'common' PAK7 duplication at 
chr20:9,685,4 1 3 -9,83 1 ,947, which is carried by 27 of 32 indivi- 
duals with duplications overlapping the PAK7 locus in the 
British Isles samples. Haplotype analysis used a core set of Affy- 
metrix SNPs that had been genotyped in 23 of the samples and all 
genotype data were converted to the forward strand. The four 
samples omitted were from the CLOZUK sample genotyped 
on Illumina where no suitable proxy SNPs were available for 
analysis. From HapMap data, the start of the duplication sits in 
a haplotype block that extends from rs2423462 (9682770) to 
rs2423467 (9689876). The part of this block that is outside the 
region, termed Hap Block 1 is from rs2423462 (9682770) to 
rs742450 (9684493). The duplication ends in a region between 
haplotype blocks. Immediately, 3' of the duplication is a haplo- 
type block that extends from rs6057009 (9840327) to rs6 1 1 88 1 9 
(9867572) (Hap Block 2). We have focused on haplotypes 
outside the duplication region, to analyze diploid genotypes, 
and limited the analysis to regions of high LD where haplotypes 
could be phased. In the absence of family data, extended haplo- 
types outside these local blocks could not be accurately phased. 
Haplotype frequencies and phased haplotypes, with a probabil- 
ity >0.98, were estimated using the -hap and -hap-phase 
functions in Plink (see Supplementary Material, Tables S6 and 
S7). For either Hap Block, the chances that all 23 samples 
would carry at least one copy of a specific haplotype, i.e. be 
either heterozygous or homozygous for this haplotype q = [q 2 
(probability of homozygous q) + 2*<7*(1 — q) (probability of 
homozygous q)] 23 = (2q — q ) 23 where q is the haplotype fre- 
quency. This probability was calculated for both Hap Blocks 
and in the absence of LD between the blocks, these numbers 
were multiplied together to determine the combined probability 
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of all 23 carriers carry copies of the same haplotypes either side 
of the duplication. 

CNV breakpoint sequencing 

Using an Agilent custom designed CGH microarray with high 
probe density in the PAK7 region, we confirmed the duplication 
event in one individual and the breakpoints identified by CGH 
(chr20:9,684,902-9,833,151) map closely to those predicted 
by the SNP array analyses (chr20:9,685,413-9,831,947; Sup- 
plementary Material, Fig. S4). Based on these estimated break- 
points and assuming that this is a tandem duplication that is 
not inverted, we attempted to sequence from one copy of the du- 
plication into the second copy in order to identify the precise 
breakpoints. Using a forward primer positioned at the end of 
the duplication (TCTCTGTTGGATGGAGCTTCT) and a 
reverse primer positioned at the start of the duplication 
(CGATGTAAAAAGACACAAGAGAAA), we successfully 
PCR-amplified this unique region in a carrier sample. The 
PCR did not amplify in noncarrier samples. Using capillary se- 
quencing, we identified the event as being a tandem duplication 
of 148 951 bp at chr20:9,684,767-9,833,717(hgl8) with a 
15 bp sequence inserted between the first and second copies of 
the repeated sequence (Fig. 2). 

Functional analysis 

Protein extraction 

Synaptoneurosome preparation was performed from full mouse 
brain at P8- 10 as described elsewhere with modifications (53). 
Briefly, brain tissue was glass/glass homogenized in ice-cold 
lysis buffer daily-fresh made until complete lysis. The protein 
extraction buffer contained 10 mM Tris Base, 0.1 m sodium 
chloride, 4 mM EDTA, 0.1% NP40 and was added with protease 
inhibitor cocktail (Roche Diagnostic). The homogenate was cen- 
trifuged at 1 OOOg x 1 0 min at 4°C in order to separate heavy cel- 
lular membranes. The resulting supernatant was passed through 
two 105 |JLm polypropylene mesh (Amazon, USA) and a 5 |xm 
nitrocellulose filter (Millipore) and finally centrifuged at 
1000 g x 15 min at 4°C. The obtained synaptoneurosomal- 
depleted supernatant contains cytoplasmic proteins and was 
collected in fresh tubes; while the pellet, which is enriched in 
synaptoneurosomal proteins, was resuspended in protein extrac- 
tion buffer. Protein concentration was measured by spectropho- 
tometer quantification (Nanodrop, ND8000) in both cytoplasmic 
and synaptoneurosomal-enriched extracts for further analysis. 

Western blot was carried out by standard procedures with the 
following specifications: protein samples were fractioned 
through 8 or 10% acrylamide gel and blotted onto PVDF mem- 
brane. The membrane was probed overnight with antibody 
anti-MAP2 (mouse monoclonal, Millipore, MAB3418, 
1:2500) and antibody anti-PSD95 (rabbit polyclonal, Cell Sig- 
naling Technology, 2507, 1:2500). Antibody anti-(3-actin 
(rabbit polyclonal, Cell Signaling Technology, 4967, 1:5000) 
was used as a loading control. After 16 h, three washes of 
5 min with 0.5% Tween/TBS were carried out followed by 1 h 
incubation with the following secondary antibodies: anti-mouse 
HRP-linked IgG (Cell Signaling Technology, 7076, 1:10 000) 
and anti-rabbit HRP-linked IgG (Cell Signaling Technology, 
7074, 1:10 000). After washing as above, ECL substrate 



(Millipore) was applied and chemiluminescent signal was 
revealed by exposing photographic film (Sigma, Z370398) to 
the membrane for ~ 5 min. 

Co-immunoprecipitation 

Co-immunoprecipitation procedure was adapted from Felli etal. 
(2005) (54) as described below. Synaptoneurosomal protein 
extracts were made from full brain tissue as described and 
PAK7 was immunoprecipitated. For both experiments, 
500 |jLl-volume samples containing 1000 (jug of synaptoneuroso- 
mal extracts were obtained by dilution in protein extraction 
buffer. Samples were immunoprecipitated with either 4 (jug anti- 
body anti-PAK7 (polyclonal goat, S-16, Santa Cruz Biotechnol- 
ogy, sc-22155) or 4 (jig antibody anti-DISCl (polyclonal goat, 
N-16, Santa Cruz Biotechnology, sc-47990). Normal goat 
IgGs (Sigma, 15256) were used as a negative control. After 
16 h, all samples were incubated with 25 julI of beads (protein 
A/G PLUS-Agarose, Santa Cruz Biotechnology, sc-2003) for 
4 h and washed. Samples were then resolved by western blotting 
as described, and probed with antibody anti-DISC 1 (rabbit poly- 
clonal antibody anti-DISCl Mid, Invitrogen, 40-6900, 1:1500) 
or anti-PAK7 (polyclonal rabbit anti-PAK7, Sigma, 
SAB3500335, 1:1000). Primary antibody incubation was fol- 
lowed by incubation with biotinylated anti-rabbit IgG (Vector 
Labs, BA-1000, 1:3000) and amplification with Vectastain 
ABC kit (Vector Labs, PK-6100). 

Free floating immunohisto chemistry 

For histological staining, brain tissue from mice at three different 
developmental stages was used: P7 (postnatal day 7) andP > 60. 
The brains, previously fixed in 4%paraformaldehyde/PBS and 
kept in 20%sucrose/PBS at 4°C, were sliced using a vibratome 
(Leica) to obtain 50 u,m-thick coronal sections. After 5 min in- 
cubation with 0.1% hydrogen peroxide, sections were washed 
in PBS, and moved for 1 h into a PBS blocking buffer containing 
10% goat serum and 3% Triton fortissue permeabilization. Sec- 
tions were then incubated for ~36 h in blocking buffer added 
with the following primary antibodies: anti-PAK7 (polyclonal 
goat, Santa Cruz Biotechnology, sc-22155, 1:750), anti-DISCl 
(polyclonal rabbit, Invitrogen, 40-6900, 1:500), anti-Synapsin 
1/2 (polyclonal guinea pig, Synaptic Systems, 106 004, 
1:1000). Primary antibody probing was followed by several 
PBS washes for a total time of 2 h. Sections were then moved 
to a secondary antibodies solution made of 0.05%Triton-X100 
and added with the following fiuorofore-conjugated antibodies: 
DyLight 488 donkey anti-goat IgG (Jackson ImmunoResearch 
Labs, 705-485-003, 1:500), DyLight 649 goat anti-rabbit IgG 
(Jackson ImmunoResearch Labs, 111-495-003, 1:500) and 
Cy3 goat anti-guinea pig IgG (Jackson ImmunoResearch Labs, 
106-165-003, 1:500). After 2h, sections were washed for 
15 min in PBS and mounted onto microscope slides using 
media containing DAPI (Vector H-1200). Co-immunostaining 
of PAK7 and DISC 1 was confirmed by confocal microscope 
imaging (Zeiss, x63 objective) in hippocampus and brain 
cortex fields at each considered age. 

SUPPLEMENTARY MATERIAL 

Supplementary Material is available at HMG online. 
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